Motion reconstruction method from inter-frame feature correspondences of a single video stream using a motion library

ABSTRACT

The present invention relates to a motion reconstruction method from inter-frame feature correspondences of a single video stream using a motion library.  
     The motion library is constructed through a motion capture system, which is coming into increasingly active use, or by skilled animators.  
     The present invention describes in detail the whole process from images to motions. This process is simple to implement and straightforward.  
     The total movement of a motion can be effectively inferred, and this technology can be directly used in various fields in which human motions are produced from images.

BACKGROUND OF THE INVENTION

[0001] The present invention relates to a motion reconstruction method from inter-frame feature correspondences. More particularly, the invention relates to a method of obtaining 3D motion information from a single video stream by transforming the similar motions retrieved from a motion library and fitting them to the input image information.

[0002] Recently, the production of virtual character animation using motion capture technology has become very active.

[0003] The advantage of motion capture technology is that realistic motions can be produced quickly and efficiently in comparison with other conventional animation production methods.

[0004] Since captured motion clips are short and generally tied to particular characters or environments, there has been continuous development of various types of editing tools which recycle the captured motions for the production of new animations.

[0005] Based on these tools, animators can appropriately reuse the captured motions according to the various requirements of virtual characters and environments.

[0006] Monocular images captured by a camera are the most standard medium for storing human motions, and to date many researchers have conducted experiments, for various purposes, on extracting human motions from monocular images.

[0007] There has been continuous research activity on the automatic reconstruction of motions from images based on image analysis technology. In general, these studies rely on a probabilistic model for ascertaining the position of an articulated body.

[0008] Among these, Azarbayejani, et al. (C. Wren, A. Azarbayejani, T. Darrell and A. Pentland. Pfinder: Real-time tracking of the human body. IEEE Trans. Pattern Analysis and Machine Intelligence, 1997) proposed a method for real-time tracking of a human body from images obtained from one or more cameras.

[0009] The above paper classifies a human body into a number of blobs, and the 3D location of each blob is tracked by a probabilistic model.

[0010] Bregler and Malik (C. Bregler and J. Malik. Estimation and tracking of kinematic chains. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1998.) expressed the kinematics of a multi-articulated body in terms of twists and the exponential map. Based on this representation, the motions of a human body are inferred from images obtained from one or more cameras.

[0011] Sminchisescu and Triggs (C. Sminchisescu and B. Triggs. Covariance scaled sampling for monocular 3D body tracking. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2001.) drew attention to the difficulty of reconstructing 3D information from images due to ambiguity and occlusion. They attempted to obtain a nearby solution using an optimization method.

[0012] In order to avoid local minimum solutions during the optimization, the paper uses the covariance-scaled sampling method in conjunction with a numerical optimization method.

[0013] Research has also concentrated on effective methods, already known to some researchers, for extracting 3D information for a previously known model from 2D feature information.

[0014] For example, Zheng and Suezaki (J. Y. Zheng and S. Suezaki. A model based approach in extracting and generating human motion. Proceedings of Fourteenth International Conference on Pattern Recognition, 1998.) proposed a model-based method of capturing the motions of a multi-articulated body from monocular images.

[0015] The above paper disclosed a method of producing 3D motions for the whole image sequence by designating a number of key frames, obtaining 3D information from the key frames, and storing it.

[0016] Rehg, et al. (J. M. Rehg and T. Kanade. Visual tracking of high DOF articulated structures: an application to human hand tracking. European Conf. on Computer Vision, 1994.) attempted to reconstruct 3D information using a probabilistic approach that includes a kinematics model and limiting conditions on the articulation angles, as well as other limiting conditions.

[0017] Kakadiaris and Metaxas (I. Kakadiaris and D. Metaxas. Model-based estimation of 3D human motion with occlusion based on active multi-viewpoint selection. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1996.) disclosed a method of obtaining the proportions of a given model from a single image using anthropometry information. Taylor (C. J. Taylor. Reconstruction of articulated objects from point correspondences in a single uncalibrated image. Computer Vision and Image Understanding, 2000.) disclosed a method of obtaining detailed 3D information based on a previously known model using the foreshortening phenomenon which occurs in an image.

[0018] Liebowitz and Carlsson (D. Liebowitz and S. Carlsson. Uncalibrated motion capture exploiting articulated structure constraints. In Proc. 8th International Conference on Computer Vision, 2001.) disclosed a method of obtaining detailed dynamic information of a multi-articulated body based on images obtained from a plurality of uncalibrated cameras.

[0019] The above paper uses the limiting condition that the body proportions of a multi-articulated body are constant with respect to time.

[0020] Recently, a number of fresh attempts have appeared which shed new light on the problem of motion reconstruction.

[0021] For example, Howe, et al. (N. R. Howe, M. E. Leventon, and W. T. Freeman. Bayesian reconstruction of 3D human motion from single-camera video. Cambridge Research Laboratory TR-CRL-99-37, 1999.) attempted to solve the problem of reconstructing 3D motions from monocular images using the relationship, formed by training, between 2D features and 3D positions.

[0022] The above paper claimed that the lost depth information can be recovered by using these relationships.

[0023] Sidenbladh, et al. (H. Sidenbladh, M. J. Black, and D. J. Fleet. Stochastic tracking of 3D human figures using 2D image motion. European Conference on Computer Vision, 2000.) obtained patterns of human walking motion through training. Using these patterns, an attempt was made to reconstruct an arbitrary walking motion.

[0024] The common characteristic of this line of work is that it regards 3D motion tracking as an inference problem and approaches it accordingly.

SUMMARY OF THE INVENTION

[0025] The above-mentioned motion capturing process requires not only expensive hardware but also a performer who performs carefully according to the scenario under the given circumstances. This motion capturing technology cannot be used for obtaining motions in natural, real situations such as a sports match or a ballroom dance.

[0026] Also, the problem of reconstructing 3D motions from monocular images is technologically very demanding, even for the state of the art today. Most existing research is not suitable for reconstructing dynamic motions from monocular images taken in various circumstances, and the quality of the resulting 3D motions is not suitable for the production of animations.

[0027] Up to now, no effective method has been found for obtaining very dynamic and realistic motions of a human-like articulated body, which comprises more than 40 degrees of freedom, from monocular images.

[0028] The object of the present invention is to provide a motion reconstruction method from inter-frame feature correspondences of a single video stream using a motion library, in which the process of producing motion from images involves a simple and straightforward time transformation step, articulation angle reconstruction step, and top articulation position inferencing step.

BRIEF DESCRIPTION OF THE DRAWINGS

[0029] FIG. 1 shows a block diagram of the reconstruction process according to the present invention.

[0030] FIG. 2 shows a time transformation graph of the reference motions using keytimes according to the present invention.

[0031] FIG. 3 is a diagram which shows the time transformation of more than two articulations according to the present invention.

[0032] FIG. 4 shows a block diagram for ascertaining the articulation angles and camera parameters according to the present invention.

[0033] FIG. 5 shows the process of motion smoothing applied to the reconstructed articulation angles according to the present invention.

[0034] FIG. 6 is a diagram which shows problems that might occur when the top articulation information of the reference motions is used.

[0035] FIG. 7 is a diagram which compares the center of gravity trajectories when the dynamic characteristics are not considered.

[0036] FIG. 8 shows the reconstruction of a shooting motion according to the present invention.

[0037] FIG. 9 shows the reconstruction of a heading motion according to the present invention.

DETAILED DESCRIPTION OF THE EMBODIMENTS

[0038] Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.

[0039] First of all, the technological principle of the present invention will be explained prior to the detailed explanation of the invention. In order to reconstruct the motions of a human, detailed 3D position information of a multi-articulated body should be obtained from 2D feature points such as articulation positions.

[0040] The biggest problem in motion reconstruction from monocular images is the loss of depth information which occurs as 3D information is projected onto 2D images.

[0041] Moreover, since the focal information of the camera used for capturing the images is unknown and the camera itself may move over time, it is very difficult to obtain an absolute 3D trajectory of the performer in the video using only the image information.

[0042] The present invention proposes a new motion reconstruction method based on monocular images using a motion library.

[0043] It is assumed that the model information for the performer in the images, more specifically, the relative size ratio of each articulation and the 2D feature points over the total image sequence, is known. Based on this information, a reconstruction is carried out by transforming the similar motions retrieved from a motion library and fitting them to the image information.

[0044] This approach requires the construction of a motion library which contains many motions similar to the motions to be reconstructed.

[0045] In practice, it is difficult to maintain, and to search through, all the diversity of human motions in a library; hence, the motions associated with the objective of the present invention are assumed to be known.

[0046] For example, if the various shooting motions of a soccer player were to be obtained, then a library of shooting motions should be constructed by capturing a number of basic shooting patterns with a motion capture method. Using this motion library, the diverse and dynamic shooting motions of a real game can be reconstructed.

[0047] The following is a detailed explanation of the present invention.

[0048] 1. Constitution

[0049] FIG. 1 is a block diagram of the reconstruction process according to the present invention. A motion which is similar to the user's objective motion is selected from a motion library 10.

[0050] With reference to the selected motion, the time correspondence between the images and the reference motions is established by transforming the reference motions along the time axis (S100).

[0051] Afterwards, the motion reconstruction process is carried out through a spatial transformation of the time-transformed reference motions using the 2D feature points obtained from the images.

[0052] Since the input images are assumed to be obtained from an uncalibrated camera which moves arbitrarily, the spatial location of the performer cannot be obtained directly from these images.

[0053] Hence, after a relative location relationship is established between the articulation locations obtained by projecting the 3D multi-articulated body at the reconstructed articulation angles and the 2D feature points given as input, appropriate top articulation movements are produced using the consecutive position information obtained as above, kinematics information from a number of images, and dynamic information obtained from the reference motions (S300).

[0054] In order to establish a time correspondence between the input images and the reference motions, a number of keytimes, at which a mutual reaction between the performer and the given environment exists, are first assigned interactively.

[0055] When the reference motions are transformed along the time axis based on the assigned keytimes, the reference motions are reparameterized so that their keytimes coincide with the keytimes assigned in the images.

[0056] These transformed reference motions are taken as an initial estimate of the objective motions.

[0057] In order to obtain the relative 3D position information about the top articulation position, kinematical limiting conditions should be set which make the projected articulation locations of the 3D multi-articulated body coincide with the 2D feature points.

[0058] Since several 3D positions can exist which satisfy these general limiting conditions, an objective function, which selects the motion most similar to the reference motions, is used.

[0059] The undetermined variables such as the camera parameters and articulation angles can be obtained effectively and simultaneously using the present invention.

[0060] Also, in order to maintain the smoothness of the reconstructed motions, a noise removal process is carried out using a motion displacement map and multi-level B-spline interpolation (S. Lee, G. Wolberg, and S. Y. Shin. Scattered data interpolation with multilevel B-splines. IEEE Trans. Visualization and Computer Graphics, 1997.).

[0061] After calculating the difference between the reference motions and the reconstructed articulation angles for each frame, the multilevel B-spline interpolation nearest to this difference is obtained. This value is selected as the motion displacement map. Using this motion displacement map, the reference motions are transformed to produce the final motions.

[0062] Finally, the reconstruction process is completed by inferencing the trajectory of the top articulation position.

[0063] This reconstruction process is carried out by classifying the motions into two types.

[0064] The first case is when a mutual reaction between the performer and the surrounding environment exists. In this case, the top articulation position information of the reference motions is transformed so that the mutual reaction with the environment that appears in the images is reproduced; more specifically, the top articulation position is transformed so that it satisfies the kinematical relationship.

[0065] In this process as well, the final motions are produced by calculating the top articulation position displacement for each frame with a mutual reaction and producing a smooth multilevel B-spline interpolation curve which connects the displacement values over all frames.

[0066] The second case is when there is no mutual reaction. In this case, the top articulation position trajectory of the final motions is produced using the dynamic characteristics of the reference motions.

[0067] In particular, if a multi-articulated body is regarded as a collection of rigid bodies, the trajectory of its center of gravity under the influence of gravity alone, without a mutual reaction, can be shown to be a smooth parabola.

[0068] Using this observation, the final motions are produced by making the center of gravity trajectory of the objective motions similar to that of the reference motions.

[0069] 2. Time Transformation (S100)

[0070] A motion is a function which determines the position of a multi-articulated body with respect to time.

[0071] In the case of a multi-articulated body with n articulations, an arbitrary motion can be represented by mathematical equation 1.

$m(t) = (p_1(t), q_1(t), \ldots, q_n(t))^T$   [Mathematical Equation 1]

[0072] Here, p₁(t) ∈ R³ and q₁(t) ∈ S³ represent the location and direction of the top articulation, and q_i(t) ∈ S³ represents the direction information of the i-th (2 ≤ i ≤ n) articulation.

[0073] Also, the collection of feature points fed from the images as input is represented by mathematical equation 2.

$\bar{m}(t) = (\bar{p}_1(t), \ldots, \bar{p}_n(t))^T$   [Mathematical Equation 2]

[0074] Here, p̄_i(t) ∈ R² represents the image location of the i-th (1 ≤ i ≤ n) projected articulation, and m̄(t) is the projected information of the objective motion m(t) at time t.
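For illustration only (the patent itself discloses no code), the representations of mathematical equations 1 and 2 could be stored as follows; Python with numpy is assumed, and all names are hypothetical.

```python
import numpy as np

# A minimal sketch of the representations in mathematical equations 1 and 2.
# A motion m(t) sampled at F frames of a body with n articulations holds the
# root trajectory p1(t) and one unit quaternion per articulation per frame.
class Motion:
    def __init__(self, root_pos, joint_quats):
        self.root_pos = np.asarray(root_pos, float)        # (F, 3): p1(t)
        self.joint_quats = np.asarray(joint_quats, float)  # (F, n, 4): q_i(t)

# The projected information of equation 2 is simply a per-frame array of
# 2D articulation locations, shape (F, n, 2), tracked in the video.
```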

[0075] Here, in order to make the given reference motion m(t) coincide with the feature points m̄(t), the reference motion should be reparameterized along the time axis.

[0076] In general, the dynamic time warping method is used for establishing the time relationship between two non-continuous sample signals.

[0077] However, using the dynamic time warping method to establish the time relationship between the two signals here is very difficult without the camera information, since the two signals have different dimensions.

[0078] In order to resolve this problem, the present invention utilizes collections of keytimes which are established by the user in the input images and reference motions.

[0079] The keytimes are a collection of points at which the performer starts or finishes mutual reactions with the surrounding environment in the images and reference motions.

[0080] For example, with respect to a walking motion, the keytimes will be the moments when the person's foot touches the ground and leaves the ground. With respect to a shooting motion in a soccer match, the most important keytime will be when the player's foot strikes the ball.

[0081] These keytimes can easily be established both in the input images and in the reference motions.

[0082] After the keytimes are established, motions which most closely coincide with the collection of keytimes of the input images are produced by transforming the reference motions along the time axis again.

[0083] With respect to the i-th articulation, if we assume that K_i = {t_(i,1), . . . , t_(i,c)} and K̄_i = {t̄_(i,1), . . . , t̄_(i,c)} are the collections of keytimes defined in the input images and the reference motions respectively, the parameters of the reference motion m(t) are transformed by mathematical equation 3 so that K_i and K̄_i coincide for all i.

[0084] $t'(t) = \bar{t}_{i,k} + \left( \frac{\bar{t}_{i,k+1} - \bar{t}_{i,k}}{t_{i,k+1} - t_{i,k}} \right)(t - t_{i,k}), \qquad t_{i,k} \le t \le t_{i,k+1}$   [Mathematical Equation 3]

[0085] Here, t and t′ represent the original parameter and the transformed parameter of the reference motions, respectively.
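For illustration only, the piecewise-linear warp of mathematical equation 3 could be implemented as follows; Python with numpy is assumed, and all names are hypothetical.

```python
import numpy as np

# A sketch of the piecewise-linear time transformation of equation 3.
# `key_img` and `key_ref` are the keytime lists K_i and K̄_i assigned in
# the input images and in the reference motion (increasing, equal length).
def warp_time(t, key_img, key_ref):
    """Map an image time t to the corresponding reference-motion time t'."""
    key_img = np.asarray(key_img, float)
    key_ref = np.asarray(key_ref, float)
    k = np.clip(np.searchsorted(key_img, t, side="right") - 1,
                0, len(key_img) - 2)
    scale = (key_ref[k + 1] - key_ref[k]) / (key_img[k + 1] - key_img[k])
    return key_ref[k] + scale * (t - key_img[k])

# Example: resample a reference motion so its keytimes (frames 0, 20, 45)
# line up with the keytimes observed in the video (frames 0, 30, 50).
ts = np.arange(0, 51)
warped = warp_time(ts, key_img=[0, 30, 50], key_ref=[0, 20, 45])
```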

[0086] FIG. 2 shows this process in the case of a single articulation: the left side shows the keytimes and articulation rotation values in the reference motions, and the right side shows the keytimes and articulation rotation values in the images.

[0087] Each curve shown in FIG. 2 is one element of a unit quaternion representing the direction information of an articulation with respect to time. In general, since a human-type multi-articulated body possesses a number of articulations with mutual reactions, this process is carried out repeatedly for all articulations with defined mutual reactions.

[0088] FIG. 3 is a diagram which shows the time transformation of more than two articulations. Different keytimes are represented using bars of different brightness.

[0089] The left foot (bright color) and right foot (dark color) are represented, and the top and bottom lines represent the axes for the images and for time, respectively.

[0090] In sequence, it shows the keytimes for the images and reference motions → resampling with respect to the total time → resampling with respect to the right foot.

[0091] 3. Reconstruction of Articulation Angle

[0092] 3-1 Kinematical Limiting Conditions

[0093] In every frame, the projected location of the multi-articulated body on the image plane should coincide with the feature points of the input images.

[0094] It is difficult to directly determine the trajectory of the top articulation position from the images, since the input images are obtained from an uncalibrated camera with an arbitrary trajectory and a reference body, from which the location of the performer could be calculated, does not always exist.

[0095] According to the present invention, the articulation angles are determined through a process that matches the relative locations of the articulations with respect to the top articulation position.

[0096] In this case, the position of the multi-articulated body with the top articulation fixed at the origin, $x(t) = m(t)|_{p_1(t) = 0_3}$, becomes $x(t) = (0_3, q_1(t), \ldots, q_n(t))^T$.

[0097] In order to obtain the articulation positions of the multi-articulated body as projected in the images, a camera model is required.

[0098] In general, a camera model with full degrees of freedom can be represented by mathematical equation 4.

$c = (t_x, t_y, t_z, r_x, r_y, r_z, \alpha, f)$   [Mathematical Equation 4]

[0099] Here, (t_x, t_y, t_z) and (r_x, r_y, r_z) represent the location and direction of the camera, and α and f represent the aspect ratio and the focal distance, respectively.

[0100] If the camera is assumed to be facing the top articulation position of the performer, then the camera coordinates can be represented in terms of the distance between the performer and the camera and the direction information.

[0101] The simplified camera model used in the present invention can be represented as mathematical equation 5.

$c = (r_x, r_y, r_z, \gamma)$   [Mathematical Equation 5]

[0102] Here, γ is the ratio between the focal distance and the distance between the performer and the camera.

[0103] The simplified model of mathematical equation 5 is sufficient for making the feature points of the input images coincide with the projected articulation locations by controlling the relative locations of the articulations with respect to the top articulation position.

[0104] The method of inferencing the top articulation trajectory in order to obtain the total movement of the performer will be explained later.

[0105] The kinematical limiting conditions of a multi-articulated body based on the camera parameters can be represented by mathematical equation 6.

$\left\| \bar{p}_i(t) - P_c f_i(x(t)) \right\| = 0$   [Mathematical Equation 6]

[0106] Here, f_i(•) is the forward kinematics function for the i-th articulation, and P_c is the projection transformation matrix constructed from the camera parameters c.
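As a sketch of how the simplified camera of mathematical equation 5 and the residual of mathematical equation 6 might be evaluated, assuming Python with numpy and scipy and a weak-perspective reading of the projection (which the patent does not spell out); `points3d` stands for the forward-kinematics output f_i(x(t)) relative to the top articulation:

```python
import numpy as np
from scipy.spatial.transform import Rotation

# Sketch of the simplified camera c = (r_x, r_y, r_z, gamma) of equation 5.
# points3d holds articulation positions relative to the top articulation,
# shape (n, 3); gamma plays the role of the ratio between focal distance
# and performer-camera distance.
def project(points3d, r_xyz, gamma):
    R = Rotation.from_euler("xyz", r_xyz).as_matrix()
    cam = points3d @ R.T           # rotate into the camera frame
    return gamma * cam[:, :2]      # weak-perspective scaling onto the image

# Per-articulation violation of equation 6: ||p̄_i(t) - P_c f_i(x(t))||.
def kinematic_residual(features2d, points3d, r_xyz, gamma):
    return np.linalg.norm(features2d - project(points3d, r_xyz, gamma), axis=1)
```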

[0107] 3-2 Objective Function

[0108] Since the number of kinematical limiting conditions is less than the number of degrees of freedom of a multi-articulated body, there are a number of positions of the multi-articulated body that satisfy the kinematical limiting conditions.

[0109] DiFranco, et al. (D. E. DiFranco, T. J. Cham, and J. M. Rehg. Recovery of 3D articulated motion from 2D correspondences. Cambridge Research Laboratory TR-CRL-99-7, 1999.) proposed that the ambiguity of the depth information can be resolved by introducing additional limiting conditions such as the motion of the articulation angles.

[0110] However, even with the additional limiting conditions, there are still many undetermined variables when obtaining 3D information from 2D information.

[0111] According to the present invention, reference motion information is used in order to obtain the best position information.

[0112] Since the reference motions are similar to the objective motions in the images, selecting the articulation angles with the least difference from them ensures the naturalness of the final motions.

[0113] Hence, a position x(t) which minimizes the objective function of mathematical equation 7 should be ascertained.

$g(x(t)) = \mathrm{dist}(x^r(t), x(t))$   [Mathematical Equation 7]

[0114] Here, x^r(t) is the position information consisting of the articulation angles of the reference motions at time t, and dist(•) is a function which specifies the difference between two different directions, as defined in mathematical equation 8.

[0115] $\mathrm{dist}(x^r(t), x(t)) = \sum_{i=1}^{n} \left\| \ln\!\left( (q_i(t))^{-1} q_i^r(t) \right) \right\|^2$   [Mathematical Equation 8]

[0116] Here, ln(•) is the logarithm map on unit quaternions (K. Shoemake. Animating rotation with quaternion curves. Computer Graphics (Proceedings of SIGGRAPH 85), 1985.).
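A minimal sketch of the distance of mathematical equations 7 and 8, assuming unit quaternions in (w, x, y, z) order; the helper names are illustrative, not from the patent:

```python
import numpy as np

def quat_conj(q):
    """Conjugate (= inverse for unit quaternions)."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def quat_mul(a, b):
    """Hamilton product of two quaternions."""
    w1, x1, y1, z1 = a; w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def quat_log(q):
    """Logarithm map of a unit quaternion: a rotation vector in R^3."""
    w, v = q[0], q[1:]
    s = np.linalg.norm(v)
    if s < 1e-12:
        return np.zeros(3)
    return v / s * np.arctan2(s, w)

def pose_distance(q_ref, q):
    """dist(x^r(t), x(t)) of equation 8 for stacked (n, 4) quaternion arrays."""
    return sum(np.linalg.norm(quat_log(quat_mul(quat_conj(qi), ri)))**2
               for qi, ri in zip(q, q_ref))
```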

[0117] 3-3 Method of Obtaining the Solutions (Articulation Reconstruction)

[0118] The problem of obtaining the articulation angles can be summarized as obtaining a position x(t) which minimizes the objective function g(•) under the given kinematical limiting conditions.

[0119] If this constrained minimization is changed into a form without the limiting conditions, it becomes mathematical equation 9.

[0120] $g(x(t)) = \sum_{i=1}^{n} \left\| \bar{p}_i(t) - P_c f_i(x(t)) \right\|^2 + \omega\,\mathrm{dist}(x^r(t), x(t))$   [Mathematical Equation 9]

[0121] Here, ω is a weight for combining the two objective terms.

[0122] The first term of mathematical equation 9 measures the difference between the feature points of the input images and the projected articulations of the 3D multi-articulated body, and the second term measures the difference between the final articulation angles and the articulation angles of the reference motions.

[0123] In order to solve this minimization problem, the conjugate gradient method is used (W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical recipes in C: The art of scientific computing. Cambridge University Press, 1992.).

[0124] The reason mathematical equation 9 is difficult to solve is that the camera information c is included in a single equation along with the position information x(t).

[0125] A standard optimization method such as the conjugate gradient method performs best when the objective function is in quadratic form.

[0126] Hence, according to the present invention, good convergence is obtained by treating the two sets of parameters independently during the optimization process.

[0127] One of the two parameter sets is fixed while the other is being optimized. FIG. 4 is a block diagram which shows this repeated calculation method for ascertaining the articulation angles and camera parameters.
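The alternation of FIG. 4 might look like the following sketch, in which `pose_objective` and `camera_objective` stand for mathematical equation 9 with one parameter block held fixed, and scipy's conjugate gradient routine stands in for the cited Numerical Recipes implementation:

```python
import numpy as np
from scipy.optimize import minimize

# Block-coordinate descent: fix the camera and minimize over the pose,
# then fix the pose and minimize over the camera, and repeat.
def alternate(pose0, cam0, pose_objective, camera_objective, iters=10):
    pose, cam = np.asarray(pose0, float), np.asarray(cam0, float)
    for _ in range(iters):
        pose = minimize(lambda p: pose_objective(p, cam), pose,
                        method="CG").x   # articulation angles, camera fixed
        cam = minimize(lambda c: camera_objective(pose, c), cam,
                       method="CG").x    # camera parameters, pose fixed
    return pose, cam
```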

[0128] In an optimization process based on numerical analysis, the initial estimate is very important for obtaining a good result.

[0129] In the reconstruction process according to the present invention, the time-transformed reference motions always provide a good initial estimate of the objective articulation angles.

[0130] Since the camera model according to the present invention has four parameters, the initial estimate of the camera parameters can be calculated by, first, selecting three points which lie on the same plane at the hip of the articulated body and, second, conducting a least squares approximation over these three points.

[0131] 3-4 Motion Smoothing

[0132] In practice, it is very difficult to accurately track the locations of the articulations from the video stream without any marking apparatus attached to the performer.

[0133] Also, the values of the reconstructed articulation angles are jerky because the input feature points contain noise.

[0134] In order to reduce this jerkiness, a motion displacement map is used in conjunction with the multi-level B-spline interpolation.

[0135] A motion displacement map represents the difference between two different motions.

[0136] The motion displacement map d(t) = x(t) ⊖ x^r(t), which shows the difference between the reference position information x^r(t) and the reconstructed position information x(t), can be defined as in mathematical equation 10.

[0137] $d(t) = \begin{bmatrix} 0_3 \\ v_1(t) \\ \vdots \\ v_n(t) \end{bmatrix} = \begin{bmatrix} 0_3 \\ q_1(t) \\ \vdots \\ q_n(t) \end{bmatrix} \ominus \begin{bmatrix} 0_3 \\ q_1^r(t) \\ \vdots \\ q_n^r(t) \end{bmatrix} = \begin{bmatrix} 0_3 \\ \ln\!\left( (q_1^r(t))^{-1} q_1(t) \right) \\ \vdots \\ \ln\!\left( (q_n^r(t))^{-1} q_n(t) \right) \end{bmatrix}$   [Mathematical Equation 10]

[0138] Here, v_i(t) ∈ R³ is the rotation vector of the i-th (1 ≤ i ≤ n) articulation. Accordingly, a new position can be calculated by adding the motion displacement map to the reference motions, x(t) = x^r(t) ⊕ d(t), as in mathematical equation 11.

[0139] $x(t) = \begin{bmatrix} 0_3 \\ q_1^r(t) \\ \vdots \\ q_n^r(t) \end{bmatrix} \oplus \begin{bmatrix} 0_3 \\ v_1(t) \\ \vdots \\ v_n(t) \end{bmatrix} = \begin{bmatrix} 0_3 \\ q_1^r(t)\exp(v_1(t)) \\ \vdots \\ q_n^r(t)\exp(v_n(t)) \end{bmatrix}$   [Mathematical Equation 11]

[0140] From the articulation angle displacements d(i) of the frames i, a smooth motion displacement map d(t) which approximates d(i) for all i can be obtained by the multi-level B-spline interpolation.

[0141] Unlike a local B-spline interpolation, a good approximation of both the overall shape and the local characteristics can be obtained using the multi-level structure of the multi-level B-spline interpolation.

[0142] Basically, the approximation error at a coarse level is propagated across the whole set of frames and is later corrected at a finer level.

[0143] Hence, the knot array which corresponds to the coarsest level provides a rough approximation of the overall shape, and afterwards a close approximation is obtained using B-splines with finer knot arrays.

[0144] By adding d(t) to the reference motions, the position information x^c(t), which consists of the final articulation angles, can be obtained using mathematical equation 12.

$x^c(t) = x^r(t) \oplus d(t)$   [Mathematical Equation 12]
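A simplified sketch of the smoothing of mathematical equations 10 to 12: per-frame displacements are taken with the quaternion logarithm, approximated coarse-to-fine with least-squares B-splines, and applied back with the exponential. The knot-doubling loop is an illustrative stand-in for the cited multilevel B-spline method, not a reproduction of it; `quat_log`, `quat_mul`, and `quat_conj` are the helpers from the earlier sketch.

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline

def quat_exp(v):
    """Inverse of quat_log: rotation vector in R^3 -> unit quaternion."""
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate([[np.cos(theta)], np.sin(theta) * v / theta])

def smooth_channel(t, d, levels=3, coarse_knots=2):
    """Coarse-to-fine least-squares B-spline fit of one displacement channel."""
    approx = np.zeros_like(d)
    knots = coarse_knots
    for _ in range(levels):
        interior = np.linspace(t[0], t[-1], knots + 2)[1:-1]
        fit = LSQUnivariateSpline(t, d - approx, interior, k=3)
        approx = approx + fit(t)   # each level corrects the previous residual
        knots *= 2                 # finer knot array at the next level
    return approx

# Applying the smoothed displacement, as in equations 11 and 12:
# q_i^c(t) = q_i^r(t) * exp(v_i(t)), reusing quat_mul from the earlier sketch.
def apply_displacement(q_ref, v):
    return quat_mul(q_ref, quat_exp(v))
```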

[0145] This process is illustrated in FIG. 5, which shows the process of motion smoothing applied to the reconstructed articulation angles.

[0146] Each of the curves in FIG. 5 represents one element of a unit quaternion. In order, they show the reference motion and the reconstructed articulation rotation angles → the articulation rotation angle difference → the smoothed articulation rotation angle difference → the final articulation rotation angles.

[0147] With a properly selected knot array interval, the approximation process can approximate the noisy inputs to a desired accuracy level, yielding smooth movements.

[0148] 4. Inferencing of Top Articulation Position (S300)

[0149] The final motion m(t) can be obtained by adding to x^c(t) the motion displacement map d(t) = (p₁(t), 0₃, . . . , 0₃)^T, which carries the top articulation position information.

[0150] Here, p₁(t) is the top articulation position trajectory represented in global coordinates.

[0151] Since the images were captured following the performer with an uncalibrated camera, it is difficult to ascertain the top articulation position information from the images. Hence, the top articulation position trajectory is obtained using the kinematical limiting conditions and the characteristics of the reference motions.

[0152] The first case is when a mutual reaction exists. A walking motion is a typical example of the first case, since the feet of the performer touch the ground.

[0153] The second case is when no mutual reaction exists. A jumping motion can be regarded as a typical example of the second case.

[0154] 4-1 When a Mutual Reaction Exists With the Surrounding Environment

[0155] When the kick motion of a soccer player is examined (see FIG. 6), the top articulation position trajectory of the reference motion (left picture) is very different from that of the reconstructed motion (right picture), since the reconstructed motion is very dynamic.

[0156] Hence, the kinematical limiting condition that the supporting foot should touch the ground is applied to the top articulation of the reference motion, and the height is changed accordingly.

[0157] After calculating, for every frame i, the distance between the supporting foot and the ground surface in the motion m(t) = x^c(t) ⊕ (p^r₁(t), 0₃, . . . , 0₃)^T under the limiting conditions, and producing the motion displacement map d(t) from the per-frame values d(i) by the multi-level B-spline interpolation, the final motions are produced using the interactive motion editing method (J. Lee and S. Y. Shin. A hierarchical approach to interactive motion editing for human-like figures. Computer Graphics (Proc. SIGGRAPH '99), 1999.) with m(t) = x^c(t) ⊕ (d(t), 0₃, . . . , 0₃)^T as the initial estimate.

[0158] Here, x^c(t) and p^r₁(t) are the reconstructed articulation angles and the time-transformed top articulation trajectory of the reference motions, respectively.
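A much simplified sketch of this contact case: measure the vertical gap between the supporting foot and the ground at each contact keytime, and spread the corrections over the clip with a smooth curve. This stands in for both the multi-level B-spline step and the cited interactive motion editing method; all names are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Per-frame vertical offset for the top articulation so that the supporting
# foot meets the ground at each contact keytime. Assumes the contact frames
# lie strictly inside the clip; the offset is pinned to zero at both ends.
def root_height_correction(n_frames, contact_frames, foot_height, ground=0.0):
    contacts = sorted(contact_frames)
    xs = [0] + contacts + [n_frames - 1]
    ys = [0.0] + [ground - foot_height[f] for f in contacts] + [0.0]
    curve = CubicSpline(xs, ys, bc_type="natural")
    return curve(np.arange(n_frames))
```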

[0159] 4-2 When No Mutual Reaction Exists With the Surrounding Environment

[0160] Unlike the case explained above, no mutual reaction with the surrounding environment exists in the case of a jumping motion.

[0161] In this case, the top articulation trajectory of the objective motion is produced using the dynamic characteristics of the reference motions.

[0162] If a multi-articulated body is assumed to be a collection of rigid bodies, then the center of gravity trajectory of the multi-articulated body under the influence of gravity alone, without a mutual reaction, can be shown to be a smooth parabola.

[0163] The center of gravity trajectory of the reference motions can be represented as mathematical equation 13.

[0164] $\mathrm{cog}^r(t) = p_1^r(t) + \frac{\sum_{i=1}^{n} m_i \tilde{p}_i^r(t)}{\sum_{i=1}^{n} m_i}$   [Mathematical Equation 13]

[0165] Here, p̃^r_i(t) (1 ≤ i ≤ n) and m_i (1 ≤ i ≤ n) represent the relative location vector from the top articulation position p₁^r(t) to the i-th articulation and the mass of the i-th articulation, respectively.

[0166] As explained above, since the reference motions are piecewise-linearly transformed along the time axis, the center of gravity trajectory of the objective motions can likewise be obtained from mathematical equation 13 by a piecewise-linear transformation, as shown in mathematical equation 14.

[0167] $\mathrm{cog}(t) = s\,\mathrm{cog}^r(t) = p_1(t) + \frac{\sum_{i=1}^{n} m_i \tilde{p}_i(t)}{\sum_{i=1}^{n} m_i}$   [Mathematical Equation 14]

[0168] Here, s and p̃_i(t) (1 ≤ i ≤ n) represent the scaling factor for the time transformation and the relative location of the i-th articulation of the objective motion from the top articulation position, respectively.

[0169] p̃_i(t) can be obtained from the reconstructed articulation angles x^c(t). Accordingly, the final top articulation position p₁(t) can be obtained from mathematical equation 15 using mathematical equations 13 and 14.

[0170] $p_1(t) = s\,p_1^r(t) + s\left( \frac{\sum_{i=1}^{n} m_i \left( \tilde{p}_i^r(t) - \tilde{p}_i(t) \right)}{\sum_{i=1}^{n} m_i} \right)$   [Mathematical Equation 15]
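A sketch of mathematical equations 13 to 15, transferring the reference motion's center of gravity trajectory to the reconstructed motion (Python with numpy assumed; array names are illustrative):

```python
import numpy as np

# `rel_ref` and `rel_rec` are the mass-point offsets p̃_i(t) of each
# articulation from the top articulation, shape (F, n, 3); `masses` has
# shape (n,). The scaling factor s accounts for the time transformation,
# as in equation 14.
def root_from_cog(root_ref, rel_ref, rel_rec, masses, s=1.0):
    """Return the top articulation trajectory p1(t) of equation 15."""
    w = masses / masses.sum()                            # normalized weights
    diff = np.einsum("fnj,n->fj", rel_ref - rel_rec, w)  # weighted offset gap
    return s * root_ref + s * diff
```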

[0171] As can be seen from FIG. 7, the center of gravity trajectory of the reconstructed motion (right picture) obtained by the above method is a smooth parabola.

[0172] 5. Experimental Results

[0173] Hereinafter, the experimental results for a shooting motion with the foot and for a heading motion, reconstructed according to the present invention, will be explained.

[0174] The human model used in the experiments has 40 degrees of freedom, including 6 for the location and direction of the top articulation, 3 each for the chest and neck, and 7 for each hand and foot.

[0175] The motion clips used as the reference motions are sampled at 60 Hz, and the keytimes are assigned manually.

[0176] The video used is standard video of the kind commonly used in public broadcasting, and the locations of the keytimes and feature points are assigned manually.

[0177] Table 1 (motion library comprising reference motions) shows the types of motions captured for the experiment.

TABLE 1

  (kicks)    ball placement:
             place   volley(h)†   volley   sliding
  instep     ◯       ◯            ◯        —
  inside     ◯       ◯            —        ◯
  outside    ◯       ◯            ◯        ◯
  toe        ◯       —            —        —
  heel       ◯       —            —        —
  overhead   —       —            ◯        —
  cut        ◯       —            —        —
  turning    ◯       —            —        —

  (headings)           head direction:
                       front   left   right
  stand                ◯       ◯      ◯
  jump (single foot)   ◯       ◯      ◯
  jump (both feet)     ◯       ◯      ◯
  stand back           ◯       ◯      ◯
  jump back            ◯       ◯      ◯

[0178] For the shooting motions in a soccer game, the motions in Table 1 are sufficient.

[0179] Table 1 lists the shooting motions of a soccer player († denotes a half volley); these motions are classified by the relative position of the foot or head with respect to the position of the ball.

[0180] Each of the motion clips was captured with optical motion capture equipment, and its duration is about two to three seconds.

[0181] The suggested reconstruction method is implemented with TGS OpenInventor™, which provides a C++ 3D graphics interface, under MS Windows XP™.

[0182] The experiments were carried out on a Pentium IV™-based personal computer with 512 MB of main memory.

[0183] 5-1. Shooting Motion by a Foot

[0184] FIG. 8 shows the reconstruction of a shooting motion by a soccer player using the library motion 'place instep kick' in Table 1.

[0185] The top row shows the input images with the feature points, the middle row the reference motions, and the bottom row the reconstructed motions.

[0186] The image clip used consists of 51 frames, and the camera is placed on the right-hand side of the performer.

[0187] The number of keytimes assigned here is four for the left foot and two for the right foot. Table 2 (error analysis data) shows the errors in the relative locations with respect to the top articulation position, as defined in mathematical equation 16.

TABLE 2

                                     kick motion (51 frames)      heading motion (37 frames)
                                     min.     max.     avg.       min.     max.     avg.
  original                           0.0331   0.3049   0.1040     0.0116   0.1918   0.0846
  time warped                        0.0272   0.1624   0.0915     0.0100   0.1397   0.0710
  reconstructed (knot spacing: 1)    0.0011   0.0100   0.0043     0.0005   0.0100   0.0051
  reconstructed (knot spacing: 4)    0.0013   0.0227   0.0088     0.0008   0.0159   0.0058

[0188] Table 2 shows the minimum, maximum and average error values over all frames; these errors were measured in normalized coordinates in the range [−1, 1].

[0189] $e(t) = \sum_{i=1}^{n} \left\| \bar{p}_i(t) - P_c f_i(x^c(t)) \right\|^2$   [Mathematical Equation 16]

[0190] Here, p̄_i(t) represents the 2D feature point of the i-th articulation in the images, and x^c(t) represents the reconstructed position information.

[0191] As can be seen from Table 2, the time transformation process significantly reduces the maximum error over all frames.

[0192] Because the hand-picked 2D feature points contain noise over the time frames, when the motion smoothing is carried out by the multi-level B-spline interpolation with a minimum knot interval of 4, the total motion becomes smooth while staying within the error range.

[0193] The total reconstruction time, including manual work, is about 30 minutes.

[0194] 5-2 Goal Scoring With a Head Motion

[0195] FIG. 9 shows the reconstruction of a heading motion by a soccer player using the library motion 'left jump heading with both feet' in Table 1.

[0196] The top row is the input images, the middle row the reference motions, and the bottom row the reconstructed motions.

[0197] The image clip used consists of 37 frames, and the camera is placed behind the performer.

[0198] The number of keytimes assigned here is two for each foot and one for the head. Since the total motion corresponds to a jumping motion, the trajectory of the top articulation is produced using the center of gravity trajectory of the reference motions.

[0199] As explained so far, the present invention describes in detail the whole process from images to motions. These processes, which comprise a plurality of steps, are simple to implement and straightforward. In particular, the total movement of a motion can be effectively inferred.

[0200] Using the present invention, first, the dynamic movements of players in a sports game can be effectively obtained; hence, the present invention can be used for constructing replays of sporting motions. Second, similar but diverse human motions can easily be produced from a few captured motions; hence, production costs, e.g., of animation, can be reduced.

What is claimed is:
 1. A motion reconstruction method from inter-frame feature correspondences of a single video stream using a motion library, comprising: a time transformation step which selects, from the motion library that contains various motion information, motions similar to the objective motions (reference motions) and transforms the time correspondence between the reference motions and the input images along the time axis of the reference motions; an articulation reconstruction step which ascertains the articulation angles and makes the relative location relationship between the 2D feature points obtained from said input images and the articulation angles coincide; and a top articulation position inferencing step which produces an appropriate top articulation movement from the continuous position information obtained from said above steps, kinematics information obtained from the input images, and dynamical information obtained from said reference motions.
 2. The method as claimed in claim 1, wherein keytimes (a collection of points at which the performer starts or finishes mutual reactions with the surrounding environment in the images and reference motions) which are assigned by the user in the input images and reference motions are used.
 3. The method as claimed in claim 2, wherein said keytimes are used for producing motions which most closely match the collection of keytimes of the input images by transforming the reference motions along the time axis.
 4. The method as claimed in claim 2 or claim 3, wherein, when said collections of keytimes defined in the input images and reference motions for the i-th articulation are K_i = {t_(i,1), . . . , t_(i,c)} and K̄_i = {t̄_(i,1), . . . , t̄_(i,c)}, the parameters of the reference motions m(t) are transformed by mathematical equation 17.

$t'(t) = \bar{t}_{i,k} + \left( \frac{\bar{t}_{i,k+1} - \bar{t}_{i,k}}{t_{i,k+1} - t_{i,k}} \right)(t - t_{i,k}), \qquad t_{i,k} \le t \le t_{i,k+1}$   [Mathematical Equation 17]


5. A motion reconstruction method from inter-frame feature correspondences of a single video stream using a motion library, comprising: a kinematical limiting condition assigning step which makes the locations of the projected articulations of a 3D multi-articulated body coincide with the 2D feature points; an objective function minimization step which minimizes the objective function by selecting the motions that are most similar to the reference motions; an articulation reconstruction step that ascertains the position shapes which minimize the objective function under the kinematical limiting conditions; and a smoothing step that calculates the difference between said reference motions and the articulation angles reconstructed for each frame, selects a motion displacement map by approximating this difference using the multi-level B-spline interpolation, and transforms the reference motions using said motion displacement map to obtain the final motion.
 6. The method as claimed in claim 5, wherein said kinematical limiting conditions, based on camera parameters, can be represented by mathematical equation 18.

$\left\| \bar{p}_i(t) - P_c f_i(x(t)) \right\| = 0$   [Mathematical Equation 18]

Here, f_i(•) is the forward kinematics function for the i-th articulation, P_c is the projection transformation matrix constructed from the camera parameters c, and p̄_i(t) is the projected image location of the i-th articulation.
 7. The method as claimed in claim 6, wherein said camera parameters c are represented by the camera direction (r_x, r_y, r_z) and the ratio γ between the focal distance and the distance between the performer and the camera.
 8. The method as claimed in claim 5, wherein the position x(t) is obtained by minimizing the objective function in mathematical equation 19.

$g(x(t)) = \mathrm{dist}(x^r(t), x(t))$   [Mathematical Equation 19]

Here, x^r(t) is the position information consisting of the articulation angles of the reference motions at time t, and dist(•) is a function which represents the distance between two different directions.
 9. The method as claimed in claim 5, wherein said minimization problem is solved by the conjugate gradient method, in which one of the two parameter sets is fixed while the other is being optimized.
 10. The method as claimed in claim 5, wherein, when said motion displacement map is d(t) and x^r(t) is a reference motion, the new motion information x(t) = x^r(t) ⊕ d(t) is obtained by calculating mathematical equation 20.

$x(t) = \begin{bmatrix} 0_3 \\ q_1^r(t) \\ \vdots \\ q_n^r(t) \end{bmatrix} \oplus \begin{bmatrix} 0_3 \\ v_1(t) \\ \vdots \\ v_n(t) \end{bmatrix} = \begin{bmatrix} 0_3 \\ q_1^r(t)\exp(v_1(t)) \\ \vdots \\ q_n^r(t)\exp(v_n(t)) \end{bmatrix}$   [Mathematical Equation 20]


11. The method as claimed in claim 1, wherein, when the motion has a mutual reaction with the surrounding environment in said top articulation position inferencing step, the height is changed after applying the kinematical limiting conditions to the top articulation of the reference motion.
 12. The method as claimed in claim 1, wherein, when the motion has no mutual reaction with the surrounding environment in said top articulation position inferencing step, the final top articulation position of the objective motion is obtained by mathematical equation 21 using the dynamical characteristics of said reference motion.

$p_1(t) = s\,p_1^r(t) + s\left( \frac{\sum_{i=1}^{n} m_i \left( \tilde{p}_i^r(t) - \tilde{p}_i(t) \right)}{\sum_{i=1}^{n} m_i} \right)$   [Mathematical Equation 21]

Here, s and p̃_i(t) (1 ≤ i ≤ n) represent the scaling factor for the time transformation and the relative location of the i-th articulation of the objective motion from the top articulation position, respectively.